Frame Skip Is a Powerful Parameter for Learning to Play Atari

Authors

  • Alexander Braylan
  • Mark Hollenbeck
  • Elliot Meyerson
  • Risto Miikkulainen
Abstract

We show that setting a reasonable frame skip can be critical to the performance of agents learning to play Atari 2600 games. In all six games in our experiments, frame skip is a strong determinant of success. For two of these games, setting a large frame skip leads to state-of-the-art performance.

The rate at which an agent interacts with its environment may be critical to its success. In the Arcade Learning Environment (ALE) (Bellemare et al. 2013), games run at sixty frames per second and agents can submit an action at every frame. Frame skip is the number of frames for which an action is repeated before a new action is selected. Existing reinforcement learning (RL) approaches use a static frame skip: HNEAT (Hausknecht et al. 2013) uses a frame skip of 0, DQN (Mnih et al. 2013) uses a frame skip of 2-3, and the SARSA and planning approaches of Bellemare et al. (2013) use a frame skip of 4. When action selection is computationally intensive, setting a higher frame skip can significantly decrease the time it takes to simulate an episode, at the cost of missing opportunities that exist only at a finer temporal resolution. A large frame skip can also prevent degenerate super-human-reflex strategies, such as those described by Hausknecht et al. for Bowling, Kung Fu Master, Video Pinball and Beam Rider. We show that, in addition to these advantages, agents that act with a high frame skip can actually learn faster with respect to the number of training episodes than those that skip no frames.

We present results for six of the seven games covered by Mnih et al.: three (Beam Rider, Breakout and Pong) for which DQN was able to achieve near- or superhuman performance, and three (Q*Bert, Space Invaders and Seaquest) for which all RL approaches are far from human performance. These latter games were understood to be difficult because they require ‘strategy that extends over long time scales.’ In our experiments, setting a large frame skip was critical to achieving state-of-the-art performance in two of these games: Space Invaders and Q*Bert. More generally, the frame skip parameter was a strong determinant of performance in all six games.

Our learning framework is a variant of Enforced Subpopulations (ESP) (Gomez and Miikkulainen 1997), a neuroevolution approach that has been successfully implemented and extended to train agents for a variety of complex behaviors and control tasks (e.g., Gomez and Schmidhuber 2005; Schmidhuber et al. 2007). In contrast to conventional neuroevolution (CNE), which evolves networks directly, ESP maintains a distinct population of neurons for each hidden node in the network, which enables hidden nodes to coevolve to take on complementary roles. ESP can also add hidden nodes to provide a boost when learning stagnates. In the experiments below, all networks are feedforward. The input layer is the object representation introduced by Hausknecht et al. The output layer has one node for each of the nine joystick positions and one indicating whether or not to fire. For each game we trained agents at four frame skips: 0, 20, 60 and 180. For each of these 24 setups, to maintain comparability with Hausknecht et al., we averaged scores over five runs, simulated 100 episodes per generation, and capped each episode at 50,000 frames.
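To make the frame-skip convention above concrete, the following minimal Python sketch wraps an emulator step loop so that each selected action is held for a fixed number of extra frames. The `env.act()` and `env.game_over()` method names are ALE-style assumptions used only for illustration, not an interface taken from the paper.

```python
class FrameSkipWrapper:
    """Repeat each selected action for `frame_skip` extra frames.

    Assumes a hypothetical ALE-style emulator object exposing
    act(action) -> reward (advances one frame) and game_over() -> bool.
    """

    def __init__(self, env, frame_skip=0):
        self.env = env
        self.frame_skip = frame_skip  # 0 = a new decision every frame (60 per second)

    def step(self, action):
        # Apply the action once, then repeat it `frame_skip` more times,
        # so the agent decides every (frame_skip + 1) frames; e.g. a frame
        # skip of 180 means roughly three seconds between decisions.
        total_reward = 0.0
        for _ in range(self.frame_skip + 1):
            total_reward += self.env.act(action)
            if self.env.game_over():
                break
        return total_reward
```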
To further speed up training, on all games except Seaquest (which has particularly sparse rewards) we stop agents when they have not received a positive reward in 30 game seconds. Each run lasts 200 generations. The score of a run at a given generation is the highest total reward an agent has achieved in an episode by that generation. Figure 1 depicts the training progress for each setup.

ESP performed better with a high frame skip for Beam Rider, Q*Bert, Seaquest and Space Invaders. Seaquest achieved top performance when skipping 180 frames, that is, when pausing for a full three seconds between decisions. Space Invaders and Beam Rider achieved their top performance when skipping 60 frames. Agents that use a high frame skip do not learn action selection for states that are skipped, and thus have a greater capacity to learn associations between more temporally distant states and actions. This could help deal with the non-Markovian nature of some of these games. For example, as noted by Mnih et al., in Space Invaders lasers are not visible on every fourth frame. If an agent only commits to long-term actions when lasers are visible, it will not be confused by this peculiarity. These longer-term decisions also lead to broader exploration of the behavior space, resulting in increased population diversity. On the other hand, it is not surprising that Pong and Breakout perform best with a low frame skip, since these games require fine actions to reach every state necessary to block the ball. For Pong, the performance difference would be even larger if we did not stop agents that fail to score for 30 seconds.
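A rough sketch of the per-episode evaluation described above, with the 50,000-frame cap and the 30-game-second no-reward cutoff (disabled for Seaquest), might look as follows. The `agent.select_action()`, `env.reset()`, `env.observe()`, `env.act()` and `env.game_over()` names are hypothetical placeholders, not the paper's actual code.

```python
FPS = 60
MAX_FRAMES = 50_000            # per-episode cap used in the experiments
NO_REWARD_CUTOFF = 30 * FPS    # 30 game seconds without a positive reward

def evaluate_episode(agent, env, frame_skip, use_cutoff=True):
    """Run one capped episode and return its total reward.

    All method names on `agent` and `env` are illustrative placeholders.
    Pass use_cutoff=False for games with very sparse rewards (Seaquest).
    """
    env.reset()
    total_reward = 0.0
    frames = 0
    frames_since_reward = 0
    while frames < MAX_FRAMES and not env.game_over():
        action = agent.select_action(env.observe())
        # Hold the chosen action for frame_skip + 1 emulation frames.
        for _ in range(frame_skip + 1):
            reward = env.act(action)
            total_reward += reward
            frames += 1
            frames_since_reward = 0 if reward > 0 else frames_since_reward + 1
            if env.game_over() or frames >= MAX_FRAMES:
                break
        if use_cutoff and frames_since_reward >= NO_REWARD_CUTOFF:
            break  # agent has gone 30 game seconds without scoring
    return total_reward
```

The score reported for a run at a given generation would then be the running maximum of these episode rewards across all generations so far.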



Publication date: 2015